Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zed: Default app data location #4758

Merged
merged 2 commits into from
Oct 10, 2023
Merged

zed: Default app data location #4758

merged 2 commits into from
Oct 10, 2023

Conversation

mattnibs
Copy link
Collaborator

@mattnibs mattnibs commented Sep 1, 2023

If no lake location is specified, zed will use a per-os default
location. Also if the default location is used, zed will first check if
there's a service on 9867 and opt to connect to that instead of
operating locally.

@mattnibs mattnibs requested a review from a team September 1, 2023 00:14
@mattnibs mattnibs force-pushed the default-datadir branch 2 times, most recently from db0eb5d to e4103dd Compare September 6, 2023 21:31
@mattnibs mattnibs marked this pull request as ready for review September 6, 2023 21:32
@philrz
Copy link
Contributor

philrz commented Sep 26, 2023

I was reviewing the paths proposed in this PR relative to some of the research I'd done at the request of the team when writing up #3956. I'll start by saying that I imagine any set of paths could probably be workable. However, to the degree we want to use this opportunity to be in line with prevailing conventions, we might want to think about some adjustments. Here my thoughts on each platform.

Linux


The changes in this PR show the Zed tools now leveraging $XDG_DATA_HOME, so to the degree that we might want to go all the way with XDG rather than just cherry pick, there's more in the XDG Base Directory Specification we might want to consider.

  1. The XDG spec says that:

    If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

    Meanwhile, the changes in this PR have the lake ending up under $HOME/.zed in that case.

  2. It's not directly in scope of this specific PR, but while I'm at it I feel I should point out that once we're using XDG conventions for some files created by Zed tools, we might want to use that opportunity to do the same with other paths. For instance, the XDG spec says:

    $XDG_CONFIG_HOME defines the base directory relative to which user-specific configuration files should be stored. If $XDG_CONFIG_HOME is either not set or empty, a default equal to $HOME/.config should be used.

    So for instance we currently use $HOME/.zed as the home to hold the credentials.json that's created after a successful zed auth login. I suspect that kind of credential fits the definition of "user-specific configuration files", so we may want to consider using that environment variable and fallback path. It should also be noted that the proposed use of $HOME/.zed in this PR effectively clashes with the current use of $HOME/.zed for this credential, which would be another thing we'd probably want to avoid (i.e., having client config inside the directory with lake data).

    Also from the spec:

    $XDG_STATE_HOME defines the base directory relative to which user-specific state files should be stored. If $XDG_STATE_HOME is either not set or empty, a default equal to $HOME/.local/state should be used.

    The .zed_head file that's created after zed use runs successfully may meet the definition of "user-specific state files" (or perhaps it's another example of "user-specific configuration"?) so we may want to rethink this one as well.

    The spec speaks of other possible paths for certain uses, and I suspect there may be other files created by the Zed tooling that I'm not thinking of. A wider audit of the code might be justified if we want to cover all our bases at once.

macOS


We discussed this one as a team. At first I was in favor of the way this PR would currently have Zed using a directory ~/Library/Application Support/zed. However, after we discussed the use of XDG on Linux, a couple points were made:

  1. @mattnibs has recently observed some tools that follow the XDG conventions even on macOS.
  2. While the ~/Library/Application Support directory is indeed used by lots of macOS software today, it was pointed out that these appear to be mostly/all GUI-based apps rather than CLI tools. Most CLI tools still default to using something below the home directory.

These two things together make a compelling case for leveraging the XDG variables and fallback paths even on macOS.

Windows


A comment in this PR mentions checking both LOCALAPPDATA and APPDATA because Windows XP and older releases didn't have LOCALAPPDATA. This motivation seems suspect because Windows XP was EOL as of 2014. Instead I think we should assume both variables be available and then consistently leverage the one that makes the most sense for Zed's needs.

An example of these paths from my personal Windows 10 system are that my LOCALAPPDATA defaults to C:\Users\Phil\AppData\Local and my APPDATA defaults to C:\Users\Phil\AppData\Roaming. There's lots of resources online that talk about the common uses for "Local" vs. "Roaming", but this video does a nice job of summarizing.

My take-away is that we could probably rationalize either option. The advantage of putting the Zed lake below "Roaming" (so, leverage APPDATA) would be that in a properly-configured Windows domain the user would have the benefit of their lake data automatically being available to them each time they login to a different system. And the motivation to put the Zed lake below "Local" (so, leverage LOCALAPPDATA) is basically the opposite side of the same coin: As mentioned around minute 7 of the video, software that creates large amounts of data (which a Zed lake can certainly be!) will often put its data below "Local" precisely because replicating that data in a roaming environment could be seen as heavyweight and undesirable. For this reason I suspect LOCALAPPDATA gets the edge, so the spirit of what's in the PR seems ok to me (though I might be inclined to drop the fallback to APPDATA since as mentioned previously we should be able to rely on LOCALAPPDATA existing.)

FWIW, per the Zui docs, Zui happens to currently leverage APPDATA for it stored data, including the lake it currently creates. But in the direction things are headed it seems like Zui is likely to adapt to Zed's new behavior, so I'm just pointing this out.

cli/lakeflags/datadir.go Outdated Show resolved Hide resolved
@mattnibs mattnibs changed the base branch from main to use-no-args September 29, 2023 14:04
Base automatically changed from use-no-args to main September 29, 2023 14:13
@mattnibs mattnibs force-pushed the default-datadir branch 4 times, most recently from e6afe28 to 98e2802 Compare October 2, 2023 16:45
@philrz
Copy link
Contributor

philrz commented Oct 2, 2023

@mattnibs: I'd remembered you indicating this branch was maybe finished, so I just did a retest of the branch at e6afe28 and it's still not making sense to me. For instance If I don't have ZED_LAKE or XDG_DATA_HOME set, I see:

$ zed -version
Version: v1.10.0-2-ge6afe283

$ zed serve
serve command available for local lakes only

$ zed init
lake created: /Users/phil/.local/share/zed

$ zed serve
serve command available for local lakes only

$ zed ls
Post "http://localhost:9867/query?ctrl=T": dial tcp [::1]:9867: connect: connection refused

$ ls -l $HOME/.local/share/zed
total 8
drwxr-xr-x  4 phil  staff  128 Oct  2 09:40 index_rules
-rw-r--r--  1 phil  staff   53 Oct  2 09:40 lake.zng
drwxr-xr-x  4 phil  staff  128 Oct  2 09:40 pools

$ zed -lake $HOME/.local/share/zed ls
Post "http://localhost:9867/query?ctrl=T": dial tcp [::1]:9867: connect: connection refused

And even if I set XDG_DATA_HOME to a specific path, the zed init is still the only command that seems to do what I'd expected.

@mattnibs
Copy link
Collaborator Author

mattnibs commented Oct 2, 2023

@philrz I marked it finished but it is not, I'm still finding bugs in it.

@mattnibs mattnibs force-pushed the default-datadir branch 2 times, most recently from 2d0db17 to b0e2d08 Compare October 2, 2023 17:01
@mattnibs
Copy link
Collaborator Author

mattnibs commented Oct 2, 2023

@philrz et al this pr is now ready for review.

@philrz

This comment was marked as resolved.

@mattnibs mattnibs force-pushed the default-datadir branch 3 times, most recently from 488ca20 to c6f9a57 Compare October 2, 2023 20:17
@mattnibs mattnibs requested a review from nwt October 2, 2023 20:20
If no lake location is specified, zed will use a per-os default
location. Also if the default location is used, zed will first check if
there's a service on 9867 and opt to connect to that instead of
operating locally.
cli/lakeflags/flags.go Outdated Show resolved Hide resolved
Comment on lines 62 to 68
func portInUse(port string) bool {
ln, err := net.Listen("tcp", port)
if err != nil {
return true
}
ln.Close()
return false
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to have a test for this but this test would be need to be restricted to something like a container environment where we can guarantee that port 9867 isn't being used by another instance of zed. We'll have to hold off for now, but I'll add this to my mental list of more difficult things we should test for in some future test setup that will include a long running test of compaction.

@mattnibs mattnibs requested a review from nwt October 4, 2023 17:14
cli/lakeflags/flags.go Outdated Show resolved Hide resolved
@mattnibs mattnibs merged commit e0f4839 into main Oct 10, 2023
3 checks passed
@mattnibs mattnibs deleted the default-datadir branch October 10, 2023 20:46
@philrz philrz linked an issue Oct 19, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default/shared filesystem paths
3 participants